Evaluation of ETSI advanced DSR front-end and bias removal method on the Japanese newspaper article sentences speech corpus

Authors

  • Satoru Tsuge
  • Shingo Kuroiwa
  • Kenji Kita
Abstract

In October 2002, the European Telecommunications Standards Institute (ETSI) recommended a standard advanced front-end for Distributed Speech Recognition (DSR), ETSI ES202 050 version 1.1.1 (ES202). Many studies have evaluated this front-end in noisy environments, across several languages, on connected digit recognition tasks. However, we are not aware of any reports of large vocabulary continuous speech recognition using this front-end on a Japanese speech corpus. Since DSR systems are used across several languages and tasks, we conducted large vocabulary continuous speech recognition experiments using ES202 on a Japanese speech corpus in noisy environments. The experimental results show that ES202 achieves better recognition performance than the previous DSR front-end, ETSI ES201 050 version 1.1.2, under all conditions. In addition, we examine how acoustic mismatches caused by input devices affect DSR recognition performance. Because DSR employs a vector quantization (VQ) algorithm for feature compression, these mismatches increase the VQ distortion, and large VQ distortion in turn increases the speech recognition error rate. To counter this increase in VQ distortion, we proposed the Bias Removal Method (BRM) in previous work. However, that method cannot be applied in real time. Hence, in this paper we propose the Real-time Bias Removal Method (RBRM). Continuous speech recognition experiments on a Japanese speech corpus show that RBRM achieves an 8.7% improvement in the error rate compared to ES202 under noisy conditions (SNR = 20 dB with convolutional noise).
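The link between device mismatch and VQ distortion described in the abstract can be illustrated with a small sketch. It is not the authors' BRM or RBRM, whose details are not given here. It assumes that a convolutional mismatch shows up as a constant additive offset on cepstral-like features, and it uses one possible causal scheme: tracking the mean quantization residual with an exponential average and subtracting it from later frames. The codebook, the offset, the smoothing factor `alpha`, and all function names are hypothetical.

```python
import random

# Hypothetical 2-D codebook; a real DSR front-end quantizes cepstral sub-vectors.
CODEBOOK = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]


def quantize(vec, codebook):
    """Return (nearest codeword, squared Euclidean distortion)."""
    best = min(codebook, key=lambda cw: sum((v - c) ** 2 for v, c in zip(vec, cw)))
    return best, sum((v - c) ** 2 for v, c in zip(vec, best))


def mean_distortion(frames, codebook):
    """Average VQ distortion over a sequence of frames."""
    return sum(quantize(f, codebook)[1] for f in frames) / len(frames)


def running_bias_removal(frames, codebook, alpha=0.05):
    """Causally subtract a bias estimate that tracks the mean VQ residual.

    Assumed scheme for illustration only, not the published RBRM.
    """
    bias = [0.0] * len(frames[0])
    compensated = []
    for frame in frames:
        comp = tuple(x - b for x, b in zip(frame, bias))
        compensated.append(comp)
        codeword, _ = quantize(comp, codebook)
        # Update the estimate only after the frame is emitted, so each frame
        # depends on past frames alone.
        bias = [b + alpha * (x - c) for b, x, c in zip(bias, comp, codeword)]
    return compensated


def demo(n_frames=500, seed=0):
    """Return mean distortion for clean, biased, and compensated frames."""
    rng = random.Random(seed)
    clean = []
    for _ in range(n_frames):
        cw = rng.choice(CODEBOOK)
        clean.append(tuple(c + rng.gauss(0.0, 0.3) for c in cw))
    offset = (1.0, -0.8)  # assumed constant device mismatch
    biased = [tuple(x + o for x, o in zip(f, offset)) for f in clean]
    compensated = running_bias_removal(biased, CODEBOOK)
    return (
        mean_distortion(clean, CODEBOOK),
        mean_distortion(biased, CODEBOOK),
        mean_distortion(compensated, CODEBOOK),
    )
```

Calling `demo()` returns the three mean distortions in the order clean, biased, compensated. In this toy setting the offset raises the distortion well above the clean value, and the running estimate brings it back down once the exponential average has converged.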

Similar resources

RFC 4060 RTP Payloads for ETSI DSR Codecs

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited. Abstract This document specifies RTP payload formats for enca...

The design of the newspaper-based Japanese large vocabulary continuous speech recognition corpus

In this paper we present the first public Japanese speech corpus for large vocabulary continuous speech recognition (LVCSR) technology, which we have titled JNAS (Japanese Newspaper Article Sentences). We designed it to be comparable to the corpora used in the American and European LVCSR projects. The corpus contains speech recordings (60 hrs.) and their orthographic transcriptions for 306 spea...

Internet - Draft RTP Payloads for ETSI DSR

Status of this Memo By submitting this Internet-Draft, I certify that any applicable patent or other IPR claims of which I am aware have been disclosed, and any of which I become aware will be disclosed, in accordance with RFC 3668. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute w...

Speaker recognition and the ETSI Standard Distributed Speech Recognition Front-End

With the advent of Wireless Application Protocol (WAP) and 2.5/3G communication systems, the mobile device has become a window to the Internet. A natural interface to this mobile device is through speech. To address this need, a new European Telecommunications Standards Institute (ETSI) standard front-end has evolved for Distributed Speech Recognition (DSR). The goal of the ETSI DSR front-end i...

Robust Feature Extraction for Speech Recognition by Enhancing Auditory Spectrum

The goal of this work is to improve the robustness of speech recognition systems in additive noise and real-time reverberant environments. In this paper we present a compressive gammachirp filter-bank-based feature extractor that incorporates a method for the enhancement of auditory spectrum and a short-time feature normalization technique, which, by adjusting the scale and mean of cepstral feat...

Publication date: 2003